precision level
P$^2$U: Progressive Precision Update For Efficient Model Distribution
Afrabandpey, Homayun, Tavakoli, Hamed Rezazadegan
Efficient model distribution is becoming increasingly critical in bandwidth-constrained environments. In this paper, we propose a simple yet effective approach called Progressive Precision Update (P$^2$U) to address this problem. Instead of transmitting the original high-precision model, P$^2$U transmits a lower-bit precision model, coupled with a model update representing the difference between the original high-precision model and the transmitted low-precision version. With extensive experiments on various model architectures, ranging from small models ($1$ to $6$ million parameters) to a large model (more than $100$ million parameters), and using three different data sets (chest X-Ray, PASCAL-VOC, and CIFAR-100), we demonstrate that P$^2$U consistently achieves a better trade-off between accuracy, bandwidth usage, and latency. Moreover, we show that when bandwidth or startup time is the priority, aggressive quantization (e.g., 4-bit) can be used without severely compromising performance. These results establish P$^2$U as an effective and practical solution for scalable and efficient model distribution in low-resource settings, including federated learning, edge computing, and IoT deployments. Because P$^2$U complements existing compression techniques and can be implemented alongside any compression method (e.g., sparsification, quantization, or pruning), the potential for improvement is even greater.
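The core idea in the P$^2$U abstract (send a low-precision model first, then an update equal to the difference from the original) can be illustrated with a minimal sketch. The uniform symmetric quantizer, the function names, and the toy weights below are assumptions made for illustration; the abstract does not specify the quantizer or how the update itself is encoded, and here the update is simply kept in full precision.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization to signed integer codes (illustrative choice)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


# Hypothetical high-precision weights held by the sender.
weights = [0.82, -0.41, 0.07, -0.95, 0.33]

# Stage 1: transmit the low-precision model (4-bit codes plus one scale).
codes, scale = quantize(weights, bits=4)
low = dequantize(codes, scale)

# Stage 2: transmit the update, i.e. the difference to the original weights.
update = [w - l for w, l in zip(weights, low)]

# Receiver: the low-precision model is usable at once; adding the update restores it.
restored = [l + u for l, u in zip(low, update)]
```

In this sketch the receiver can start inference with `low` as soon as stage 1 arrives, and each entry of `update` is bounded in magnitude by half the quantization step, which is what makes the update a small-range signal.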
QSViT: A Methodology for Quantizing Spiking Vision Transformers
Putra, Rachmad Vidya Wicaksana, Iftikhar, Saad, Shafique, Muhammad
Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capability in resource-constrained embedded AI systems is challenging due to their inherently large memory footprints and complex computations, which incur high power/energy consumption. Recently, Spiking Vision Transformer (SViT)-based models have emerged as alternative low-power ViT networks. However, their large memory footprints still hinder their applicability for resource-constrained embedded AI systems. Therefore, there is a need for a methodology to compress SViT models without degrading the accuracy significantly. To address this, we propose QSViT, a novel design methodology to compress SViT models through a systematic quantization strategy across different network layers. QSViT employs several key steps: (1) investigating the impact of different precision levels in different network layers, (2) identifying the appropriate base quantization settings for guiding bit precision reduction, (3) performing a guided quantization strategy based on the base settings to select the appropriate quantization setting, and (4) developing an efficient quantized network based on the selected quantization setting. The experimental results demonstrate that our QSViT methodology achieves 22.75% memory saving and 21.33% power saving, while maintaining accuracy within 2.1% of the original non-quantized SViT model on the ImageNet dataset. These results highlight the potential of the QSViT methodology to pave the way toward efficient SViT deployments on resource-constrained embedded AI systems.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Italy > Lazio > Rome (0.04)
RAG-based User Profiling for Precision Planning in Mixed-precision Over-the-Air Federated Learning
Yuan, Jinsheng, Tang, Yun, Guo, Weisi
Mixed-precision computing, a widely applied technique in AI, offers a larger trade-off space between accuracy and efficiency. The recently proposed Mixed-Precision Over-the-Air Federated Learning (MP-OTA-FL) enables clients to operate at appropriate precision levels based on their heterogeneous hardware, taking advantage of the larger trade-off space while covering the quantization overheads in the mixed-precision modulation scheme for the OTA aggregation process. A key to further exploring the potential of the MP-OTA-FL framework is the optimization of client precision levels. The choice of precision level hinges on multifaceted factors including hardware capability, potential client contribution, and user satisfaction, some of which can be difficult to define or quantify. In this paper, we propose a RAG-based User Profiling for precision planning framework that integrates retrieval-augmented LLMs and dynamic client profiling to optimize satisfaction and contributions. This includes a hybrid interface for gathering device/user insights and a RAG database storing historical quantization decisions with feedback. Experiments show that our method boosts satisfaction, energy savings, and global model accuracy in MP-OTA-FL systems.
- North America > United States (0.05)
- Europe > United Kingdom (0.04)
Efficient Methods for Overlapping Group Lasso
The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. We have performed empirical evaluations using both synthetic and the breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms.
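For context on the proximal operator mentioned in this abstract, the sketch below shows the standard closed form for the non-overlapping group Lasso (block soft-thresholding). This is not the paper's algorithm: with overlapping groups the operator no longer decomposes group by group, which is why the authors solve a smooth convex dual problem instead. The function name and toy data are assumptions for illustration.

```python
import math


def prox_group_lasso(v, groups, lam):
    """Block soft-thresholding: prox of lam * sum_g ||v_g||_2 for disjoint groups."""
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        # Shrink the whole group toward zero; zero it out if its norm is below lam.
        shrink = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = shrink * v[i]
    return out


v = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]  # disjoint index sets (the non-overlapping case only)
result = prox_group_lasso(v, groups, lam=1.0)
```

The first group has norm 5 and is scaled by 0.8, while the second group has norm below `lam` and is removed entirely, which is the group-level feature selection the abstract refers to.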
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- North America > United States > New York (0.04)
- Europe > France (0.04)
Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision
Jayasuriya, Dinithi, Darabi, Nastaran, Hashem, Maeesha Binte, Trivedi, Amit Ranjan
We introduce a precision polarization scheme for DNN inference that utilizes only very low and very high precision levels, assigning low precision to the majority of network weights and activations while reserving high precision paths for targeted error compensation. This separation allows for distinct optimization of each precision level, thereby reducing memory and computation demands without compromising model accuracy. In the discussed approach, a floating-point model can be trained in the cloud and then downloaded to an edge device, where network weights and activations are directly quantized to meet the edge devices' desired level, such as NF4 or INT8. To address accuracy loss from quantization, surrogate paths are introduced, leveraging low-rank approximations on a layer-by-layer basis. These paths are trained with a sensitivity-based metric on minimal training data to recover accuracy loss under quantization as well as due to process variability, such as when the main prediction path is implemented using analog acceleration. Our simulation results show that neural precision polarization enables approximately 464 TOPS per Watt MAC efficiency and reliability by integrating rank-8 error recovery paths with highly efficient, though potentially unreliable, bit plane-wise compute-in-memory processing.
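The idea of pairing a low-precision main path with a low-rank, high-precision compensation path can be sketched as follows. This is only an illustration under stated assumptions: the abstract describes surrogate paths trained with a sensitivity-based metric and rank-8 recovery, whereas this sketch swaps in a rank-1 factor of the quantization residual obtained by power iteration, uses a plain uniform 4-bit quantizer rather than NF4 or INT8, and works on a made-up 3x3 weight matrix.

```python
import math


def quantize_matrix(W, bits):
    """Uniform symmetric fake-quantization (illustrative, not NF4)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for row in W for w in row) / qmax
    return [[round(w / scale) * scale for w in row] for row in W]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def normalize(x):
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / n for xi in x]


def rank1_factor(E, iters=50):
    """Leading singular triplet of E via power iteration."""
    Et = [list(col) for col in zip(*E)]
    v = normalize([1.0] * len(E[0]))
    for _ in range(iters):
        u = normalize(matvec(E, v))
        v = normalize(matvec(Et, u))
    sigma = sum(ui * ei for ui, ei in zip(u, matvec(E, v)))
    return sigma, u, v


def frobenius(M):
    return math.sqrt(sum(m * m for row in M for m in row))


W = [[0.82, -0.41, 0.07], [-0.95, 0.33, 0.58], [0.12, 0.64, -0.27]]
W_q = quantize_matrix(W, bits=4)  # low-precision main path
E = [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, W_q)]  # quantization error

sigma, u, v = rank1_factor(E)  # high-precision rank-1 compensation path
W_hat = [[q + sigma * ui * vj for q, vj in zip(rq, v)] for rq, ui in zip(W_q, u)]

err_before = frobenius(E)
err_after = frobenius([[w - h for w, h in zip(rw, rh)] for rw, rh in zip(W, W_hat)])
```

The compensation path stores only the vectors `u` and `v` plus one scalar, so its cost grows linearly with layer width while the main path stays in low precision.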
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Colorado (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
SNN4Agents: A Framework for Developing Energy-Efficient Embodied Spiking Neural Networks for Autonomous Agents
Putra, Rachmad Vidya Wicaksana, Marchisio, Alberto, Shafique, Muhammad
Recent trends have shown that autonomous agents, such as Autonomous Ground Vehicles (AGVs), Unmanned Aerial Vehicles (UAVs), and mobile robots, effectively improve human productivity in solving diverse tasks. However, since these agents are typically powered by portable batteries, they require extremely low power/energy consumption to operate over a long lifespan. To solve this challenge, neuromorphic computing has emerged as a promising solution, where bio-inspired Spiking Neural Networks (SNNs) use spikes from event-based cameras or data conversion pre-processing to perform sparse computations efficiently. However, studies of SNN deployments for autonomous agents are still at an early stage. Hence, the optimization stages for enabling efficient embodied SNN deployments for autonomous agents have not been defined systematically. Toward this, we propose a novel framework called SNN4Agents that consists of a set of optimization techniques for designing energy-efficient embodied SNNs targeting autonomous agent applications. SNN4Agents employs weight quantization, timestep reduction, and attention window reduction to jointly improve the energy efficiency, reduce the memory footprint, and optimize the processing latency while maintaining high accuracy. In the evaluation, we investigate use cases of event-based car recognition and explore the trade-offs among accuracy, latency, memory, and energy consumption. The experimental results show that our proposed framework can maintain high accuracy (i.e., 84.12% accuracy) with 68.75% memory saving, 3.58x speed-up, and 4.03x energy efficiency improvement compared to the state-of-the-art work on the NCARS dataset. In this manner, our SNN4Agents framework paves the way toward enabling energy-efficient embodied SNN deployments for autonomous agents.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Energy (0.56)
- Information Technology > Robotics & Automation (0.34)
Mixed-Precision Over-The-Air Federated Learning via Approximated Computing
Yuan, Jinsheng, Wei, Zhuangkun, Guo, Weisi
Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (AxC), where bit precisions are adjusted for energy and computational efficiency. The dynamic distribution of bit precision updates amongst FL clients poses an open challenge for OTA-FL, as it is incompatible in the wireless modulation superposition space. Here, we propose an AxC-based OTA-FL framework of clients with multiple precisions, demonstrating the following innovations: (i) optimize the quantization-performance trade-off for both server and clients within the constraints of varying edge computing capabilities and learning accuracy requirements, and (ii) develop heterogeneous gradient resolution OTA-FL modulation schemes to ensure compatibility with physical layer OTA aggregation. Our findings indicate that we can design modulation schemes that enable AxC-based OTA-FL, which can achieve 50% faster and smoother server convergence and a performance enhancement for the lowest precision clients compared to a homogeneous precision approach. This demonstrates the great potential of our AxC-based OTA-FL approach in heterogeneous edge computing environments.
- Europe > United Kingdom (0.05)
- North America > United States > New York > New York County > New York City (0.04)
Quantifying the Capabilities of LLMs across Scale and Precision
Scale is often cited as one of the factors that increase the performance of LLMs, resulting in models with billions and trillions of parameters. One of the limitations of such large models is the high computational requirements that limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use smaller versions of LLMs (e.g., Llama 7B instead of Llama 70B) and to lower the memory requirements by using quantization. While these approaches effectively address the limitation of resources, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation to investigate the effect of model scale and quantization on performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across various tasks, including natural language understanding, reasoning, misinformation detection, and hallucination, reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We found that larger models show exceptional resilience to precision reduction and can maintain high accuracy even at 4-bit quantization for numerous tasks. They also serve as a better solution than smaller models at high precision under similar memory requirements.
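The memory side of the scale-versus-precision trade-off in this abstract comes down to simple arithmetic, sketched below. The helper name is hypothetical, and the estimate counts weight storage only: it ignores quantization metadata (scales, zero points), activations, and the KV cache, so real footprints will be larger. The 16-bit baseline for the smaller model is an assumption, not a figure from the abstract.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the model sizes named in the abstract.
small_16bit = weight_memory_gb(7e9, 16)   # 7B model at 16-bit precision
large_16bit = weight_memory_gb(70e9, 16)  # 70B model at 16-bit precision
large_4bit = weight_memory_gb(70e9, 4)    # 70B model at 4-bit quantization

# Going from 16-bit to 4-bit cuts weight storage by a factor of four.
reduction = large_16bit / large_4bit
```

Under these assumptions, 4-bit quantization brings the 70B model from roughly 140 GB down to roughly 35 GB of weights, which is the kind of budget at which a quantized large model starts to compete with smaller full-precision ones.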
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Nova Scotia (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
DisGNet: A Distance Graph Neural Network for Forward Kinematics Learning of Gough-Stewart Platform
Zhu, Huizhi, Xu, Wenxia, Huang, Jian, Li, Jiaxin
In this paper, we propose a graph neural network, DisGNet, for learning the graph distance matrix to address the forward kinematics problem of the Gough-Stewart platform. DisGNet employs the k-FWL algorithm for message-passing, providing high expressiveness with a small parameter count, making it suitable for practical deployment. Additionally, we introduce the GPU-friendly Newton-Raphson method, an efficient parallelized optimization method executed on the GPU to refine DisGNet's output poses. This novel two-stage approach delivers ultra-high-precision output while meeting real-time requirements. Our results indicate that, on our dataset, DisGNet achieves accuracies of 79.8% and 98.2% at error thresholds of 1 mm and 1 deg, respectively. Because it executes on a GPU, our two-stage method satisfies the requirement for real-time computation. Code is released at https://github.com/FLAMEZZ5201/DisGNet.
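The two-stage pattern described here (a coarse learned estimate refined by Newton-Raphson) can be illustrated on a much simpler planar analog: recovering a 2D point from its distances to two fixed anchors. This is an assumption-laden toy, not the paper's method: the real problem is a 6-DoF pose with six leg lengths, the paper's solver is parallelized on a GPU, and the function name, anchors, and initial guess below are invented for illustration.

```python
import math


def newton_refine(guess, anchors, lengths, iters=10):
    """Refine a 2D position so its distances to two anchors match `lengths`."""
    x, y = guess
    (ax, ay), (bx, by) = anchors
    la, lb = lengths
    for _ in range(iters):
        # Residuals of the squared-distance constraints.
        f1 = (x - ax) ** 2 + (y - ay) ** 2 - la ** 2
        f2 = (x - bx) ** 2 + (y - by) ** 2 - lb ** 2
        # Jacobian of the residuals with respect to (x, y).
        j11, j12 = 2 * (x - ax), 2 * (y - ay)
        j21, j22 = 2 * (x - bx), 2 * (y - by)
        det = j11 * j22 - j12 * j21
        if abs(det) < 1e-12:
            break  # singular Jacobian: stop rather than divide by zero
        # Newton step: solve J * d = -f with Cramer's rule.
        dx = (-f1 * j22 + f2 * j12) / det
        dy = (-f2 * j11 + f1 * j21) / det
        x, y = x + dx, y + dy
    return x, y


anchors = [(0.0, 0.0), (4.0, 0.0)]
lengths = (math.sqrt(5.0), math.sqrt(13.0))  # distances from the true point (1, 2)
coarse = (1.3, 1.7)  # stands in for a network's approximate prediction
x, y = newton_refine(coarse, anchors, lengths)
```

Because Newton-Raphson converges quadratically near a solution, a reasonably accurate initial estimate needs only a few iterations, which is what makes the combination attractive for real-time use. The initial guess also selects which of the mirror-image solutions is reached.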
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)